Next-Gen Financial Fraud and Money Laundering Detection Using Real and Simulated Datasets: A Comparative Study of Machine Learning, Deep Learning, and Graph Neural Networks (GNNs)

 

Reshma Rajesh Sawant1, Srijan Kumar2*

1Master’s in Business Analytics, University of Hartford, Connecticut, USA.

2Department of Mechanical Engineering, Bhilai Institute of Technology, Durg, India.

*Corresponding Author E-mail: srijankumarshrivastava@gmail.com

 

ABSTRACT:

This study presents a comparative evaluation of machine learning (ML) and deep learning (DL) models for financial fraud detection across four datasets: PaySim, IEEE-CIS, BankSim, and the 2023 Kaggle Credit Card Fraud dataset. Unlike earlier work limited to a single dataset or a single family of algorithms, this research focuses on hybrid model architectures, cross-dataset generalizability, and the practical trade-offs of implementation. We empirically evaluate Random Forest, XGBoost, SVM, CNN, LSTM, and hybrid CNN-LSTM architectures using accuracy, recall, F1-score, AUC-ROC, and false positive rate. The results show that LSTM and CNN-LSTM are best suited to temporal datasets (PaySim, IEEE-CIS), while XGBoost and Random Forest are best suited to static datasets (Kaggle, BankSim). The paper therefore offers a framework that financial institutions could use to select AI systems according to data type, latency requirements, and accuracy needs. Furthermore, given the escalating threat of money laundering, the paper draws on recent advances in graph-based and temporal deep learning models such as MAGIC and Amatriciana, which have shown strong performance in modeling financial transaction networks and detecting illicit behavior.

 

KEYWORDS: Financial fraud detection, PaySim, IEEE-CIS, BankSim, credit card fraud, machine learning, deep learning, ensemble methods, hybrid models, anti-money laundering, AMLSim, MAGIC, Amatriciana, graph neural networks, temporal learning.

 

 

1. INTRODUCTION:

According to the Association of Certified Fraud Examiners (ACFE, 2024), fraud costs the global financial industry more than $40 billion annually [16]. With the accelerated digitalization of financial services, especially mobile transactions and cross-border payments, the attack surface available to cybercriminals has expanded. Conventional rule-based fraud detection systems are interpretable and simple but adapt poorly to changing fraud patterns [13] (Singh & Zhao, 2023).

 

Advances in ML and DL now make dynamic pattern learning and near real-time fraud detection feasible. West and Bhattacharya (2016) emphasize that such models offer multiscale and adaptive capabilities that conventional systems lack [14]. Carcillo et al. (2021) also stress the relevance of fully automated ML pipelines for financial applications [4]. Nonetheless, much of the existing work reports performance on a single dataset or a single algorithm. Hence, this paper evaluates models on heterogeneous data sources to assess how well their performance generalizes.

 

Money laundering, which involves disguising the origin of illicit funds, is distinct from general financial fraud and particularly difficult to detect and prosecute. Recent studies support graph-based approaches, including GNNs and temporal-aware architectures, for detecting such schemes. MAGIC (Wójcik, 2025) and Amatriciana (Di Gennaro et al., 2025) use structural and temporal information from financial transaction networks to improve AML detection [17,18]. We therefore extend the scope of our work by drawing on the lessons of these approaches.

 

Although this research focuses on detecting financial fraud with data mining and AI, a basic understanding of scalable computing and efficient processor architectures remains relevant. Previous research on fault-tolerant multistage interconnection networks [19], cache associativity in multicore systems [20], and dependency management in superscalar pipelining [21] laid the foundation for the high-performance computing infrastructure required to train and deploy models at scale.

 

Datasets and Preprocessing:

PaySim:

PaySim simulates mobile money transactions and contains over 6.3 million records covering both normal and fraudulent activity. We restrict our evaluation to the two transaction types most prone to fraud, "cash-out" and "transfer" [10] (Lopez-Rojas & Axelsson, 2016).
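The restriction to cash-out and transfer transactions can be sketched as a simple row filter. This is a minimal illustration, not the authors' preprocessing code: the column names `type` and `isFraud`, the type codes `CASH_OUT` and `TRANSFER`, and the four sample rows are assumptions made for the example and should be checked against the actual PaySim file.

```python
import csv
import io

# Hypothetical four-row extract. Column names and type codes are assumptions
# for illustration; verify them against the real PaySim CSV before use.
SAMPLE_CSV = """step,type,amount,isFraud
1,PAYMENT,9839.64,0
1,TRANSFER,181.00,1
1,CASH_OUT,181.00,1
2,DEBIT,5337.77,0
"""

FRAUD_PRONE_TYPES = {"CASH_OUT", "TRANSFER"}


def keep_fraud_prone(rows):
    """Yield only rows whose transaction type is cash-out or transfer."""
    for row in rows:
        if row["type"] in FRAUD_PRONE_TYPES:
            yield row


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
filtered = list(keep_fraud_prone(reader))
print(len(filtered), [r["type"] for r in filtered])
```

Filtering row by row with a generator keeps memory use flat, which matters for a file with several million records.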

 

IEEE-CIS:

IEEE-CIS contains transaction-level data provided by Vesta Corporation for a Kaggle competition: about 590K anonymized transactions with both numerical and categorical features, which call for substantial feature engineering [6] (IEEE, 2019). The dataset is known for its high class imbalance and complexity, making it suitable for evaluating fraud detection models.

 

BankSim:

BankSim is a synthetic dataset produced by an agent-based simulator of interbank activity. It models customer behavior over several days, capturing realistic transaction patterns and labeling fraudulent transactions. It contains more than 600,000 labeled transactions [12] (Sebastián et al., 2015).

 

2023 Kaggle Credit Card Fraud:

This dataset is a modernized version of the 2016 credit card data, with anonymized features (V1-V30) alongside the transaction amount and class label. It contains more than 300K transactions with a severe class imbalance [8] (Kaggle, 2023).

 

All datasets were normalized and split into an 80% training set and a 20% test set. SMOTE (Chawla et al., 2002) was applied where necessary to correct class imbalance [5], consistent with the recommendations of Pozzolo et al. (2015) [11].
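The preprocessing steps described above (normalization, an 80/20 split, and SMOTE) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the pipeline used in the study: the toy data, the min-max scaling choice, the stratified split, and the helper names are all hypothetical, and `smote_like` is a simplified interpolation in the spirit of SMOTE rather than a reference implementation. The paper does not say whether oversampling was applied before or after splitting; the sketch oversamples the training split only, which avoids leaking synthetic copies of test points into training.

```python
import random


def stratified_split(y, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx


def min_max_fit(rows):
    """Per-feature (min, max) bounds, learned from training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_apply(rows, bounds):
    """Scale each feature to [0, 1] using previously fitted bounds."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, bounds)] for row in rows]


def smote_like(minority, n_new, k=3, seed=0):
    """Interpolate between a minority point and one of its k nearest
    minority neighbours to create n_new synthetic minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy data (hypothetical): 100 legitimate and 10 fraudulent transactions.
data_rng = random.Random(42)
X = [[data_rng.gauss(50, 10), data_rng.gauss(1, 0.2)] for _ in range(100)]
X += [[data_rng.gauss(500, 50), data_rng.gauss(3, 0.5)] for _ in range(10)]
y = [0] * 100 + [1] * 10

train_idx, test_idx = stratified_split(y)
bounds = min_max_fit([X[i] for i in train_idx])
X_train = min_max_apply([X[i] for i in train_idx], bounds)
y_train = [y[i] for i in train_idx]

# Oversample the minority class until both classes have equal size.
minority = [x for x, label in zip(X_train, y_train) if label == 1]
n_new = y_train.count(0) - len(minority)
X_train += smote_like(minority, n_new)
y_train += [1] * n_new
print(len(test_idx), y_train.count(0), y_train.count(1))
```

Fitting the scaling bounds on the training rows and reusing them for the test rows keeps the test set untouched by any statistic computed from it.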

 

AMLSim (IBM AML Simulator):

AMLSim is a synthetic dataset that simulates account-to-account transfers within the banking domain, with labels distinguishing illicit from legitimate transactions. Created by IBM, it serves as a benchmark for anti-money laundering (AML) models. It was normalized and split into 80% training and 20% test sets, with SMOTE used to counteract class imbalance.

 

Methodology:

Our candidate models include both traditional ML classifiers and DL architectures for sequential and spatial transaction analysis. During preprocessing, we adopted transaction aggregation [15] (Whitrow et al., 2009) and feature-engineering strategies [2] (Bahnsen et al., 2016) to improve classifier performance.
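Transaction aggregation can be illustrated with a small sketch that attaches, to each transaction, the count and total amount of the same account's earlier transactions inside a trailing time window. The function name, the 24-step window, the tuple layout, and the toy transactions are assumptions made for this example; the specific aggregates used in the study are not listed in the paper.

```python
from collections import defaultdict, deque


def add_window_aggregates(transactions, window=24):
    """Attach trailing-window aggregates to each transaction.

    `transactions` is an iterable of (account, time_step, amount) tuples
    sorted by time_step. For each one, the result records how many earlier
    transactions the same account made within the last `window` steps and
    their total amount.
    """
    history = defaultdict(deque)
    enriched = []
    for account, t, amount in transactions:
        past = history[account]
        # Drop entries that have fallen out of the trailing window.
        while past and past[0][0] <= t - window:
            past.popleft()
        enriched.append({
            "account": account,
            "time": t,
            "amount": amount,
            "prev_count": len(past),
            "prev_total": sum(a for _, a in past),
        })
        past.append((t, amount))
    return enriched


# Toy input (hypothetical), already sorted by time step.
toy = [
    ("A", 1, 100.0),
    ("B", 2, 40.0),
    ("A", 5, 250.0),
    ("B", 20, 10.0),
    ("A", 30, 20.0),
]
features = add_window_aggregates(toy)
for row in features:
    print(row)
```

Because each transaction only sees the account's earlier history, the aggregates can be computed in a streaming fashion and do not leak future information into the features.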

 

We also reviewed advanced models for money laundering detection: MAGIC [17], an acronym for Multi-Aggregation Graph Isomorphism Convolution, and Amatriciana [18], which combines graph-based learning with LSTM encoders to capture temporal dynamics.

 

a.     Algorithms Compared:

·       Logistic Regression (LR)

·       Random Forest (RF)

·       Support Vector Machine (SVM)

·       XGBoost (XGB)

·       Convolutional Neural Networks (CNN)

·       Long Short-Term Memory Networks (LSTM)

·       Hybrid CNN-LSTM

·       MAGIC (GNN-based graph convolution for link prediction)

·       Amatriciana (Temporal GNN with LSTM aggregation)
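To give a feel for what the graph-based entries in the list above operate on, the following sketch builds a tiny transaction graph and applies one round of sum aggregation in the style of a Graph Isomorphism Network layer. It is not the MAGIC or Amatriciana implementation: the account names, amounts, and two-element node features are hypothetical, and the learned neural network that a GIN layer normally applies to the aggregated vector is omitted so that only the neighbourhood-aggregation step is shown.

```python
def initial_features(edges):
    """Node features [total_sent, total_received] from (src, dst, amount) edges."""
    features = {}
    for src, dst, amount in edges:
        features.setdefault(src, [0.0, 0.0])[0] += amount
        features.setdefault(dst, [0.0, 0.0])[1] += amount
    return features


def gin_style_round(features, edges, eps=0.0):
    """One aggregation round: (1 + eps) * own features + sum over senders.

    Only the aggregation step is shown; a full GIN layer would pass the
    result through a learned multilayer perceptron.
    """
    incoming = {node: [] for node in features}
    for src, dst, _amount in edges:
        incoming[dst].append(src)
    updated = {}
    for node, h in features.items():
        agg = [(1.0 + eps) * v for v in h]
        for src in incoming[node]:
            agg = [a + b for a, b in zip(agg, features[src])]
        updated[node] = agg
    return updated


# Hypothetical transfers: A -> B -> C -> A forms a cycle, D feeds into B.
edges = [
    ("A", "B", 900.0),
    ("B", "C", 880.0),
    ("C", "A", 860.0),
    ("D", "B", 50.0),
]
h0 = initial_features(edges)
h1 = gin_style_round(h0, edges)
print(h1)
```

After one round, each account's vector mixes in information about the accounts that sent it money, which is how circular flows such as A to B to C to A start to become visible to a downstream classifier.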

 

b.    Evaluation Metrics:

·       Accuracy

·       Precision

·       Recall

·       F1-Score

·       AUC-ROC

·       False Positive Rate (FPR)
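The metrics listed above can all be derived from confusion-matrix counts, except AUC-ROC, which depends on the ranking of model scores. The sketch below uses the standard definitions; the counts and scores in the example are hypothetical and are not taken from the experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (fraud = positive class)."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


def auc_roc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half. Quadratic in the number of samples,
    so suitable for illustration only."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts: 100 fraud cases and 900 legitimate ones.
metrics = classification_metrics(tp=90, fp=10, tn=890, fn=10)
auc = auc_roc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
print(metrics, auc)
```

On imbalanced fraud data, accuracy alone can look high even when many fraud cases are missed, which is why recall, F1-score, and FPR are reported alongside it.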

 

RESULTS:

Table 1: Accuracy Comparison Across Datasets

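| Model         | PaySim | IEEE-CIS | BankSim | Kaggle 2023 |
|---------------|--------|----------|---------|-------------|
| Logistic Reg  | 93.4%  | 89.1%    | 92.0%   | 91.2%       |
| Random Forest | 96.7%  | 95.3%    | 97.1%   | 95.6%       |
| SVM           | 95.0%  | 92.4%    | 94.3%   | 93.7%       |
| XGBoost       | 97.8%  | 96.1%    | 98.2%   | 97.5%       |
| CNN           | 95.2%  | 94.8%    | 96.4%   | 94.5%       |
| LSTM          | 98.5%  | 97.3%    | 94.9%   | 93.9%       |
| CNN-LSTM      | 98.9%  | 97.8%    | 96.1%   | 95.1%       |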

 

Figure 1: Accuracy Comparison Bar Chart

 

Table 2: False Positive Rate

| Model         | Avg FPR (%) |
|---------------|-------------|
| Logistic Reg  | 4.3         |
| Random Forest | 2.6         |
| SVM           | 3.1         |
| XGBoost       | 1.7         |
| CNN           | 2.8         |
| LSTM          | 2.1         |
| CNN-LSTM      | 1.6         |

 

Figure 2: ROC Curve Comparison (PaySim Dataset)

 

Table 3: Training Time Comparison

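| Model         | PaySim (s) | IEEE-CIS (s) | BankSim (s) | Kaggle 2023 (s) |
|---------------|------------|--------------|-------------|-----------------|
| Logistic Reg  | 6.2        | 5.8          | 6.1         | 5.4             |
| Random Forest | 12.5       | 15.3         | 14.2        | 10.1            |
| SVM           | 18.0       | 16.9         | 19.3        | 17.2            |
| XGBoost       | 18.9       | 21.0         | 20.4        | 17.5            |
| CNN           | 60.5       | 65.2         | 59.1        | 62.3            |
| LSTM          | 75.0       | 82.1         | 69.4        | 66.8            |
| CNN-LSTM      | 95.3       | 102.7        | 89.5        | 88.0            |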

 

Figure 3: Confusion Matrix for XGBoost (Kaggle Dataset)

 

Table 4: Precision, Recall, and F1-Score Summary

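| Model    | Dataset  | Precision | Recall | F1-Score |
|----------|----------|-----------|--------|----------|
| XGBoost  | Kaggle   | 97.6%     | 97.3%  | 97.4%    |
| LSTM     | PaySim   | 98.1%     | 98.9%  | 98.5%    |
| RF       | BankSim  | 96.8%     | 97.5%  | 97.1%    |
| CNN-LSTM | IEEE-CIS | 97.5%     | 98.2%  | 97.8%    |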
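As a consistency check on Table 4, the F1-score can be recomputed as the harmonic mean of the reported precision and recall. The sketch below does this for the four rows of the table; the helper name and the 0.05 tolerance (half of the last reported digit) are choices made for this example.

```python
# (precision %, recall %, reported F1 %) copied from Table 4.
TABLE_4 = {
    ("XGBoost", "Kaggle"): (97.6, 97.3, 97.4),
    ("LSTM", "PaySim"): (98.1, 98.9, 98.5),
    ("RF", "BankSim"): (96.8, 97.5, 97.1),
    ("CNN-LSTM", "IEEE-CIS"): (97.5, 98.2, 97.8),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


deviations = {}
for key, (precision, recall, reported_f1) in TABLE_4.items():
    deviations[key] = abs(f1_from(precision, recall) - reported_f1)
    print(key, round(f1_from(precision, recall), 2), reported_f1)

# Every reported F1 should agree with the recomputed value up to rounding.
consistent = all(d <= 0.05 for d in deviations.values())
print(consistent)
```

All four recomputed values agree with the reported F1-scores to within rounding of one decimal place.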

 

Figure 4: AUC-ROC Scores Bar Chart (PaySim)

 

Table 5: Performance on AMLSim Dataset (Illicit Class)

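| Model       | Accuracy | F1-Score | Precision | Recall | FPR  |
|-------------|----------|----------|-----------|--------|------|
| MAGIC       | 87.3%    | 82.6%    | 90.4%     | 75.1%  | 1.9% |
| Amatriciana | 77.3%    | 76.0%    | 80.9%     | 71.5%  | 2.2% |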

 

Figure 5: Bar chart comparing the performance of the MAGIC and Amatriciana models on the AMLSim dataset (illicit class). Metrics include Accuracy, F1-Score, Precision, Recall, and False Positive Rate (FPR). Values sourced from published benchmark evaluations.

 

DISCUSSION:

The results confirm that no single model is best across all fraud scenarios. XGBoost performs well on structured, feature-rich datasets such as BankSim and Kaggle. LSTM and CNN-LSTM rank first on sequential data such as PaySim and IEEE-CIS, exhibiting strong temporal pattern recognition [7] (Jurgovsky et al., 2018).

The deep learning models generalized better on temporal data, and CNN-LSTM had the lowest average false positive rate in Table 2 (1.6%), in line with the theoretical framework of LeCun et al. (2015) [9]. Statistical approaches, such as those reviewed by Bolton and Hand (2002), still have their place as baselines against which more specialized systems are compared [3].

In addition, Ahmed et al. (2016) explain the significance of anomaly detection in networked systems, which supports our use of sequence models for fraud [1]. These insights echo the findings of Whitrow et al. (2009) and Bahnsen et al. (2016) on the need to combine temporal [15] and aggregated transactional attributes [2].
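The observation that no single model dominates can be made concrete with a small selection rule over the reported numbers: for a given dataset, pick the most accurate model whose training time fits a budget. The sketch below uses the accuracies from Table 1 and the training times from Table 3. The function name and the budgets are hypothetical, and training time is only a stand-in for deployment cost, since inference latency is not reported in the tables.

```python
DATASETS = ["PaySim", "IEEE-CIS", "BankSim", "Kaggle 2023"]

# Accuracy (%) from Table 1, listed in DATASETS order.
ACCURACY = {
    "Logistic Reg": [93.4, 89.1, 92.0, 91.2],
    "Random Forest": [96.7, 95.3, 97.1, 95.6],
    "SVM": [95.0, 92.4, 94.3, 93.7],
    "XGBoost": [97.8, 96.1, 98.2, 97.5],
    "CNN": [95.2, 94.8, 96.4, 94.5],
    "LSTM": [98.5, 97.3, 94.9, 93.9],
    "CNN-LSTM": [98.9, 97.8, 96.1, 95.1],
}

# Training time (s) from Table 3, listed in DATASETS order.
TRAIN_SECONDS = {
    "Logistic Reg": [6.2, 5.8, 6.1, 5.4],
    "Random Forest": [12.5, 15.3, 14.2, 10.1],
    "SVM": [18.0, 16.9, 19.3, 17.2],
    "XGBoost": [18.9, 21.0, 20.4, 17.5],
    "CNN": [60.5, 65.2, 59.1, 62.3],
    "LSTM": [75.0, 82.1, 69.4, 66.8],
    "CNN-LSTM": [95.3, 102.7, 89.5, 88.0],
}


def best_within_budget(dataset, max_train_seconds):
    """Most accurate model whose training time fits the budget, else None."""
    col = DATASETS.index(dataset)
    candidates = [
        (ACCURACY[model][col], model)
        for model in ACCURACY
        if TRAIN_SECONDS[model][col] <= max_train_seconds
    ]
    return max(candidates)[1] if candidates else None


print(best_within_budget("PaySim", 120))    # generous budget
print(best_within_budget("PaySim", 30))     # tight budget
print(best_within_budget("BankSim", 120))
print(best_within_budget("IEEE-CIS", 20))
```

With a generous budget the rule selects CNN-LSTM on PaySim, while a tight budget shifts the choice to the tree-based models, which mirrors the trade-off between accuracy and cost discussed above.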

 

The AML-specific literature agrees with these findings. Graph-based methods such as MAGIC [17] proved robust on the AMLSim dataset, identifying illicit nodes with high precision. Modeling the time dimension allowed Amatriciana [18] to detect laundering behaviors that develop over time. Both sets of results therefore stress that graph structure and temporal dependencies should be modeled together to capture complex financial crimes such as money laundering.

 

CONCLUSION:

The cross-dataset comparison across four datasets shows that model selection should be guided by data type and performance trade-offs. Future work may investigate federated learning and privacy-preserving frameworks that enable collaborative fraud prevention. Extending fraud detection systems with graph- and time-aware architectures, such as MAGIC [17] and Amatriciana [18], is a promising direction for tackling complex threats like money laundering.

 

Data mining tools, particularly ML- and DL-based ones, have proved immensely useful in current AML practice. They must sift through enormous volumes of financial transaction data for laundering signals such as structuring, layering, and circular transactions. Rigid rule-based systems generate large numbers of false positives that require manual review, whereas AI-based systems learn from past incidents and can detect and adapt to changes in criminal modus operandi. Several AI systems are commercially available today and have been adopted by banks across the world. HSBC uses AI to screen transactions while reducing false alerts [22]. Danske Bank replaced its legacy AML system, called cystine, with a deep learning system, improving detection accuracy and reducing operational workload by 50% [23]. ING Bank uses graph analytics to map entity relationships and uncover hidden fraud networks [24]. JPMorgan Chase uses AI modeling to track trillions of transaction flows and surface laundering activity in near real time [25]. These real-world applications show the potential of combining AI and graph-based data mining for fraud detection and AML verification.

 

REFERENCES:

1.      Ahmed M, Mahmood AN, Hu J. A survey of network anomaly detection techniques. J Netw Comput Appl. 2016; 60: 19–31. doi:10.1016/j.jnca.2015.11.016

2.      Bahnsen AC, Aouada D, Stojanovic J, Ottersten B. Feature engineering strategies for credit card fraud detection. Expert Syst Appl. 2016; 51: 134–142. doi:10.1016/j.eswa.2015.12.030

3.      Bolton RJ, Hand DJ. Statistical fraud detection: A review. Stat Sci. 2002; 17(3): 235–255. doi:10.1214/ss/1042727940

4.      Carcillo F, Le Borgne YA, Caelen O, Bontempi G. Fully Automated Fraud Detection: A Case Study. Expert Syst Appl. 2021; 150: 113290. doi:10.1016/j.eswa.2020.113290

5.      Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: Synthetic Minority Over-sampling Technique. J Artif Intell Res. 2002; 16: 321–357. doi:10.1613/jair.953

6.      IEEE-CIS Fraud Detection Dataset. Kaggle. 2019. Available from: https://www.kaggle.com/competitions/ieee-fraud-detection

7.      Jurgovsky J, Granitzer M, Ziegler K, Calabretto S, Portier PE, He-Guelton L, Caelen O. Sequence classification for credit-card fraud detection. Expert Syst Appl. 2018; 100: 234–245. doi:10.1016/j.eswa.2018.01.037

8.      Kaggle Credit Card Fraud Dataset. Kaggle. 2023. Available from: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud

9.      LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015; 521(7553): 436–444. doi:10.1038/nature14539

10.   Lopez-Rojas E, Axelsson S. PaySim: A Financial Mobile Money Simulator for Fraud Detection. 28th Eur Model Simul Symp. 2016: 249–255. Available from: https://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1050337

11.   Pozzolo AD, Caelen O, Johnson RA, Bontempi G. Calibrating probability with undersampling for unbalanced classification. IEEE Symp Ser Comput Intell. 2015: 159–166. doi:10.1109/SSCI.2015.33

12.   Sebastián S, Granados J, Rodríguez A. BankSim: Agent-based Simulator for Bank Transactions. Proc 27th Eur Model Simul Symp. 2015: 249–254. Available from: https://www.scitepress.org/Papers/2015/55718/55718.pdf

13.   Singh R, Zhao Y. A Survey on Real-Time Fraud Detection Using Machine Learning. J Financ Technol. 2023; 11(2): 145–161. doi:10.2139/ssrn.4366820

14.   West J, Bhattacharya M. Intelligent financial fraud detection: A comprehensive review. Comput Secur. 2016; 57: 47–66. doi:10.1016/j.cose.2015.09.005

15.   Whitrow C, Hand DJ, Juszczak P, Weston D, Adams NM. Transaction aggregation as a strategy for credit card fraud detection. Data Min Knowl Discov. 2009; 18(1): 30–55. doi:10.1007/s10618-008-0116-z

16.   ACFE. Report to the Nations: 2024 Global Study on Occupational Fraud and Abuse. Association of Certified Fraud Examiners; 2024. Available from: https://www.acfe.com/report-to-the-nations/2024/

17.   Wójcik F. Money Laundering Detection with Multi-Aggregation Custom Edge GIN. J Data Sci. 2025; 23(2): 145–159. doi:10.48550/arXiv.2506.00654

18.   Di Gennaro M, Panebianco F, Pianta M, Zanero S, Carminati M. Amatriciana: Exploiting Temporal GNNs for Robust and Efficient Money Laundering Detection. arXiv. 2025; 2506.00654v1. doi:10.48550/arXiv.2506.00654

19.   Kumar S. Mathematical Modelling and Simulation of a Buffered Fault Tolerant Double Tree Network. In: 15th International Conference on Advanced Computing and Communications (ADCOM 2007). IEEE; 2007: 422–433. doi:10.1109/ADCOM.2007.62

20.   Ramtake D, Singh N, Kumar S, Patle VK. Cache Associativity Analysis of Multicore Systems. In: 2020 International Conference on Computer Science, Engineering and Applications (ICCSEA). IEEE; 2020: 1–4. doi:10.1109/ICCSEA49143.2020.9132884

21.   Patel R, Kumar S. Visualizing Effect of Dependency in Superscalar Pipelining. In: 2018 4th International Conference on Recent Advances in Information Technology (RAIT). IEEE; 2018: 1–5. doi:10.1109/RAIT.2018.8388992

22.   HSBC Holdings. HSBC improves AML with machine learning. 2020. Available from: https://www.hsbc.com/news-and-media

23.   Danske Bank. Using deep learning to fight money laundering. 2021. Available from: https://danskebank.com/news-and-insights

24.   ING Bank. AI and graph analytics to detect financial crime. 2022. Available from: https://www.ing.com/Newsroom

25.   JPMorgan Chase. AI-driven fraud and AML risk monitoring at scale. 2023. Available from: https://www.jpmorganchase.com/technology

 

 

Received on 16.07.2025     Revised on 08.08.2025

Accepted on 11.09.2025     Published on 20.09.2025

Available online from September 30, 2025

Research J. Engineering and Tech. 2025; 16(3):108-114.

DOI: 10.52711/2321-581X.2025.00010

© A and V Publications. All rights reserved.

 

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.